Weave Router (v0.27) submission by steventohme · Pull Request #92 · RouteWorks/RouterArena

steventohme · 2026-05-08T04:51:39Z

Weave Router (v0.27) — submission

Affiliation: 💼 Workweave (source-available at github.com/workweave/router)

A cluster-routing system over a 12-model BYOK pool spanning all four major provider families. The pool is intentionally multi-provider — a customer who only brings an OpenAI key still gets a 3-tier choice; bringing all four keys unlocks cost-optimal cross-provider routing.

How it routes

Embed each prompt with Jina v2 INT8 ONNX (768-dim).
Top-p=4 cluster sum against per-cluster rankings trained on RouterArena's full split.
α-blended cost-quality score (α=0.40), argmax over the 12-model pool.
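The three steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the Weave Router code: the embedding is taken as already computed, "top-p=4" is read as "the 4 most similar clusters", similarity is assumed to be cosine, and the blend is assumed to weight normalised quality by α and penalise normalised cost by (1 − α). The names `route`, `centroids`, `rankings`, and `costs` are hypothetical.

```python
import math

def route(embedding, centroids, rankings, costs, model_names, top_p=4, alpha=0.40):
    """Pick one model for a prompt embedding (illustrative sketch only).

    embedding:   prompt vector (768-dim in the PR; any length works here)
    centroids:   one vector per cluster (hypothetical artifact)
    rankings:    rankings[cluster][model] = quality score for that model
    costs:       per-model cost in any consistent unit
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

    # Step 2: find the top_p most similar clusters and sum their ranking rows.
    sims = [cosine(embedding, c) for c in centroids]
    nearest = sorted(range(len(centroids)), key=lambda i: sims[i], reverse=True)[:top_p]
    quality = [sum(rankings[i][m] for i in nearest) for m in range(len(model_names))]

    # Min-max normalise so quality and cost are on comparable scales.
    def minmax(xs):
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]

    # Step 3 (assumed form): alpha weights quality, (1 - alpha) penalises cost.
    q, c = minmax(quality), minmax(costs)
    scores = [alpha * qi - (1 - alpha) * ci for qi, ci in zip(q, c)]
    return model_names[max(range(len(scores)), key=scores.__getitem__)]
```

With this form, a lower α pushes selection toward cheaper models and a higher α toward the models ranked best in the nearby clusters.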

Pool

Provider	Models
Anthropic	claude-opus-4-7, claude-sonnet-4-5, claude-haiku-4-5
OpenAI	gpt-5.5, gpt-5.4-mini, gpt-4.1
Google	gemini-3.1-pro-preview, gemini-3.1-flash-lite-preview
OpenRouter	deepseek/deepseek-v4-pro, qwen/qwen3.5-flash-02-23, deepseek/deepseek-v4-flash, moonshotai/kimi-k2.5

Files

`router_inference/config/weave-router.json`
`router_inference/predictions/weave-router.json` — 8,400 regular + 8,899 optimality
`router_inference/predictions/weave-router-robustness.json` — 420 robustness routes
Additive patches to `universal_model_names.py` (11 entries) and `model_cost/model_cost.json` (11 entries)

Inference

Direct calls to `api.openai.com`, `generativelanguage.googleapis.com`, and `openrouter.ai`. Concurrency capped to 60 in-flight per provider.
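One way to enforce a per-provider in-flight cap like the one described above is an `asyncio.Semaphore` per provider. The sketch below is an assumption about the approach, not the code used for this submission; `run_capped` and `MAX_IN_FLIGHT` are hypothetical names, and each call is represented by a zero-argument coroutine function rather than a real HTTP request.

```python
import asyncio

MAX_IN_FLIGHT = 60  # per-provider cap stated in the PR description

async def run_capped(calls_by_provider, limit=MAX_IN_FLIGHT):
    """calls_by_provider maps provider name -> list of zero-arg coroutine functions.

    Returns {provider: [results]} with results in submission order.
    """
    async def guarded(sem, call):
        async with sem:  # at most `limit` calls per provider run at once
            return await call()

    async def run_provider(calls):
        sem = asyncio.Semaphore(limit)  # one independent semaphore per provider
        return await asyncio.gather(*(guarded(sem, c) for c in calls))

    providers = list(calls_by_provider)
    results = await asyncio.gather(*(run_provider(calls_by_provider[p]) for p in providers))
    return dict(zip(providers, results))
```

Because each provider gets its own semaphore, a slow provider does not consume the budget of the others.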

99.7% of calls succeeded; 55 reasoning-heavy prompts hit OpenRouter SSE timeouts and were retried twice.
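A retry policy of the kind mentioned above could look like the sketch below. It assumes "retried twice" means up to two additional attempts after the first timeout, and it adds exponential backoff, which the PR does not mention. `call_with_retries` is a hypothetical helper, and the timeout is modelled as Python's built-in `TimeoutError`.

```python
import time

def call_with_retries(call, retries=2, base_delay=1.0, sleep=time.sleep):
    """Run call(); on TimeoutError, retry up to `retries` more times."""
    for attempt in range(retries + 1):
        try:
            return call()
        except TimeoutError:
            if attempt == retries:
                raise  # out of retries: surface the timeout to the caller
            sleep(base_delay * 2 ** attempt)  # assumed backoff: 1s, 2s, ...
```

Passing `sleep` as a parameter keeps the helper easy to exercise without real delays.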

Will trigger evaluation with `/evaluate` after review.

Weave Router is a cluster-routing system over a 12-model BYOK pool spanning Anthropic, OpenAI, Google, and OpenRouter providers. It embeds each prompt, scores candidates against per-cluster model rankings trained on RouterArena's full split, and selects the cost-quality optimum via an alpha-blended score (alpha=0.40). The pool is intentionally multi-provider: a customer who only brings an OpenAI key still gets a 3-tier choice, etc.

Files added:
- router_inference/config/weave-router.json
- router_inference/predictions/weave-router.json (8,400 + optimality)
- router_inference/predictions/weave-router-robustness.json (420)

Files patched (additive only):
- universal_model_names.py: 11 entries for the 12-model pool (gpt-4.1 + kimi-k2.5 already present upstream)
- model_cost/model_cost.json: 11 entries for the same pool

Inference: ran via the model providers' OpenAI-compatible endpoints (api.openai.com, generativelanguage.googleapis.com, openrouter.ai). Concurrency capped to 60 in-flight per provider.

Upstream already has claude-sonnet-4-5 at line 54; my surgical append re-added it. check-json hook caught the duplicate. Removing the re-added block leaves upstream's entry intact.

steventohme · 2026-05-08T05:02:31Z

/evaluate

jiarong0907 · 2026-05-08T05:10:46Z

FYI

Run set -euo pipefail
warning: The `tool.uv.dev-dependencies` field (used in `pyproject.toml`) is deprecated and will be removed in a future release; use `dependency-groups.dev` instead
From https://github.com/RouteWorks/RouterArena
 * branch            main       -> FETCH_HEAD
From https://github.com/RouteWorks/RouterArena
 * [new ref]         refs/pull/92/head -> pr-92
Preparing worktree (checking out 'pr-92')
HEAD is now at d04f1f0 fix: drop duplicate claude-sonnet-4-5 from model_cost.json
→ git fetch origin main
→ git fetch origin pull/92/head:pr-92
→ git worktree add --force /home/runner/work/RouterArena/RouterArena/base/.pr_worktrees/pr-92 pr-92
✔ Created worktree at /home/runner/work/RouterArena/RouterArena/base/.pr_worktrees/pr-92
▶ Syncing dependencies with uv...
warning: The `tool.uv.dev-dependencies` field (used in `pyproject.toml`) is deprecated and will be removed in a future release; use `dependency-groups.dev` instead
Resolved 160 packages in 0.86ms
   Building routerarena @ file:///home/runner/work/RouterArena/RouterArena/base/.pr_worktrees/pr-92
      Built routerarena @ file:///home/runner/work/RouterArena/RouterArena/base/.pr_worktrees/pr-92
Prepared 1 package in 276ms
Uninstalled 1 package in 0.51ms
Installed 1 package in 0.52ms
 - routerarena==0.1.0 (from file:///home/runner/work/RouterArena/RouterArena/base)
 + routerarena==0.1.0 (from file:///home/runner/work/RouterArena/RouterArena/base/.pr_worktrees/pr-92)
→ uv sync --locked
✔ Synced dependencies
▶ Validating prediction/config files...
warning: The `tool.uv.dev-dependencies` field (used in `pyproject.toml`) is deprecated and will be removed in a future release; use `dependency-groups.dev` instead
Checking router: weave-router
Dataset split: full
================================================================================

[1] Checking config file...
✓ Config loaded from ./router_inference/config/weave-router.json
✓ Found 12 models in config
✓ All models in config are valid (found in ModelNameManager)

[2] Checking prediction file...
✓ Predictions loaded from ./router_inference/predictions/weave-router.json

[3] Checking prediction fields against dataset...
✓ Dataset loaded: 8400 entries
  Note: Found 8899 optimality entries (excluded from size check)
✓ Prediction file has correct size
✗ Found 1390 field validation errors:
  - Entry 13 (global_index: AIME_107): generated_result.generated_answer is empty but success is True
  - Entry 15 (global_index: AIME_112): generated_result.generated_answer is empty but success is True
  - Entry 27 (global_index: AIME_113): generated_result.generated_answer is empty but success is True
  - Entry 29 (global_index: AIME_16): prompt mismatch with dataset
  -   Expected: Please solve the following mathematical problem step by step. 

Context: None

Question: Find the re...
  -   Got: Please solve the following mathematical problem step by step. 

Context: None

Question: Find the re...
  - Entry 32 (global_index: AIME_3): prompt mismatch with dataset
  -   Expected: Please solve the following mathematical problem step by step. 

Context: None

Question: For any fin...
  -   Got: Please solve the following mathematical problem step by step. 

Context: None

Question: For any fin...
  - Entry 32 (global_index: AIME_3): generated_result.generated_answer is empty but success is True
  ... and 1380 more errors

[4] Checking model cost configurations...
✓ All models have cost configurations (57 models in cost file)

================================================================================
✗ VALIDATION FAILED!
Found 1390 error(s). Please fix the issues above.
================================================================================
✗ Command failed (exit code 1): uv run --active router_inference/check_config_prediction_files.py weave-router full --check-generated-result
Deleted branch pr-92 (was d04f1f0).
→ uv run --active router_inference/check_config_prediction_files.py weave-router full --check-generated-result
→ git worktree remove --force /home/runner/work/RouterArena/RouterArena/base/.pr_worktrees/pr-92
→ git branch -D pr-92

…success rows

Two validator failures from /evaluate run:

1. 559 rows had generated_answer="" but success=true. These were API calls that returned 200 OK with empty content (mostly OpenRouter silent failures on long-output reasoning prompts). Flipped success to false; they grade as 0 (no answer).
2. ~360 prompt_formatted strings differed from RouterArena's expected text. Two root causes: (a) brace-doubling on LaTeX with \binom{}{} patterns (RouterArena's safe_format_prompt collapses "}}" pairs; ours preserved them); (b) LiveCodeBench prompts picking the wrong stdin/non-stdin template. Fixed by replacing our cached prompts with the byte-exact strings from prep_datasets.py's router_data.json and router_robustness.json.

Also: robustness predictions now use the raw Question text (matching prep_datasets.py:30) instead of our locally-formatted prompts.

check_config_prediction_files.py weave-router full --check-generated-result now passes locally.
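The first fix described above (flipping success for empty answers) can be sketched as a small pass over the prediction entries. This is a guess at the shape of the fix, not the committed code: it assumes `success` sits next to `generated_answer` inside `generated_result`, and it also treats whitespace-only answers as empty. `mark_empty_answers_failed` is a hypothetical name.

```python
def mark_empty_answers_failed(entries):
    """Set success=False where the generated answer is empty; return rows changed."""
    changed = 0
    for entry in entries:
        result = entry.get("generated_result") or {}
        answer = result.get("generated_answer") or ""
        # Assumption: whitespace-only output counts as "no answer" too.
        if result.get("success") and not answer.strip():
            result["success"] = False
            changed += 1
    return changed
```

Running a pass like this before validation keeps a 200 OK response with empty content from being reported as a successful generation.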

steventohme · 2026-05-08T05:34:20Z

/evaluate

github-actions · 2026-05-08T05:57:36Z

Router Evaluation Results

Router: weave-router
Dataset Split: full

RouterArena Metrics

Metric	Value
RouterArena Score	0.7461
Accuracy	78.43%
Total Cost	$7.718718
Avg Cost per Query	$0.000919
Avg Cost per 1K Queries	$0.9189
Number of Queries	8400
Robustness Score	0.7905
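The two average-cost rows in the table above follow from the total cost and the query count. The check below only reproduces that arithmetic from the reported figures; `cost_summary` is a hypothetical helper, not part of RouterArena.

```python
def cost_summary(total_cost, n_queries):
    """Return (avg cost per query, avg cost per 1K queries) as formatted strings."""
    avg = total_cost / n_queries
    return f"${avg:.6f}", f"${avg * 1000:.4f}"

# Figures taken from the metrics table.
per_query, per_thousand = cost_summary(7.718718, 8400)
print(per_query, per_thousand)
```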

Optimality Metrics

Metric	Value
Opt.Sel (Optimal Selection)	0.0138
Opt.Cost (Cost Efficiency)	0.1227
Opt.Acc (Accuracy vs Optimal)	1.0000

Evaluation completed by RouterArena automated workflow

yl231 · 2026-05-09T01:41:23Z

Dear @steventohme, Congrats!

I would love to update the leaderboard to have Weave Router at the top. Would you provide me with the affiliation and website, if applicable?

Best,
Yifan

steventohme · 2026-05-09T01:52:01Z

Hey Yifan. I reached out via email. We have yet to open-source the project but will do so very soon. I want us to be on the leaderboard as an open-source model. I will keep you updated when that happens (ETA 1-3 days).

steventohme · 2026-05-15T08:58:14Z

Hey @yl231,

The source is now available at https://github.com/workweave/router. We'd love to be on the leaderboard. The affiliation is Weave (https://workweave.dev), and the code link above can go alongside it.

yl231

lgtm.

steventohme added 2 commits May 7, 2026 21:49

fix: drop duplicate claude-sonnet-4-5 from model_cost.json

d04f1f0

yl231 added 2 commits May 14, 2026 21:38

Merge branch 'main' into weave-router-submission

b8a42ff

Fix a typo after merged conflict

3fba57e

yl231 self-assigned this May 15, 2026

yl231 approved these changes May 15, 2026

yl231 merged commit 7061764 into RouteWorks:main May 15, 2026
5 checks passed

yl231 merged 5 commits into RouteWorks:main from steventohme:weave-router-submission
